Fetch the data from NewsroomDB

NewsroomDB is the Tribune's proprietary database for tracking data that must be manually entered and validated rather than ingested from an official source. It's mostly used to track shooting victims and homicides. As far as I know, CPD doesn't provide granular data on shooting victims, and the definition of homicide can be tricky and can vary from source to source.

We'll grab shooting victims from the shootings collection.


In [1]:
import os
import requests

# A big object to hold all our data between steps
data = {}

def get_table_url(table_name, base_url=os.environ['NEWSROOMDB_URL']):
    # Use the base_url argument so callers can override the default
    return '{}table/json/{}'.format(base_url, table_name)

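def get_table_data(table_name, max_tries=3):
    url = get_table_url(table_name)

    # The response is large and the request sometimes fails, so retry a
    # few times instead of recursing without limit.
    for attempt in range(max_tries):
        try:
            r = requests.get(url)
            r.raise_for_status()
            return r.json()
        except (requests.exceptions.RequestException, ValueError):
            print("Request failed. Probably because the response is huge. We should fix this.")

    raise RuntimeError("Could not load table {}".format(table_name))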

data['shooting_victims'] = get_table_data('shootings')

print("Loaded {} shooting victims".format(len(data['shooting_victims'])))


Loaded 11511 shooting victims
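
For reference, the table URL is just the base URL joined to a fixed path and the table name. Here is a minimal sketch of that construction. The base URL is a made-up placeholder, not the real endpoint, and `build_table_url` is a hypothetical stand-in for `get_table_url` that doesn't read the environment.

```python
def build_table_url(table_name, base_url):
    # Same format string as get_table_url; base_url must end with a slash.
    return '{}table/json/{}'.format(base_url, table_name)

# Placeholder base URL for illustration only
example_url = build_table_url('shootings', 'https://newsroomdb.example.com/')
print(example_url)
```

Note that the format string doesn't add a separator, so the value of `NEWSROOMDB_URL` is assumed to end with a trailing slash.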

Filter to only year-to-date shooting victims


In [2]:
from datetime import date, datetime

def get_shooting_date(shooting_victim):
    return datetime.strptime(shooting_victim['Date'], '%Y-%m-%d')

def shooting_is_ytd(shooting_victim, today):
    try:
        shooting_date = get_shooting_date(shooting_victim)
    except ValueError:
        if shooting_victim['RD Number']:
            msg = "Could not parse date for shooting victim with RD Number {}".format(
                shooting_victim['RD Number'])
        else:
            msg = "Could not parse date for shooting victim with record ID {}".format(
                shooting_victim['_id'])
        
        print(msg)
        return False
        
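    # Note: month and day are compared independently, so a date such as
    # Jan. 31 is excluded when today is Mar. 30.
    return (shooting_date.month <= today.month and
            shooting_date.day <= today.day)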

today = date(2016, 3, 30)
#today = date.today()

# Use a list comprehension to filter the shooting victims to ones that
# occurred on or before today's month and day.
# Also sort by date because it makes it easier to group by year
data['shooting_victims_ytd'] = sorted([sv for sv in data['shooting_victims']
                                       if shooting_is_ytd(sv, today)],
                                      key=get_shooting_date)


Could not parse date for shooting victim with RD Number HX448309
Could not parse date for shooting victim with record ID 560bc169db573e1c2c67789e
Could not parse date for shooting victim with record ID 565d8490389ce82a2a5b07dc
Could not parse date for shooting victim with record ID 56d6c55e389ce82a2a5b09ac
Could not parse date for shooting victim with record ID 536b0f4edb573e257039a258
Could not parse date for shooting victim with record ID 53693edc389ce83e25cd4823
Could not parse date for shooting victim with record ID 536cf216db573e256fa3af22
Could not parse date for shooting victim with record ID 53ac49c8389ce835c90b18b9
Could not parse date for shooting victim with record ID 536cf773389ce835c8d88b28
Could not parse date for shooting victim with record ID 5421c1c1db573e3dc9db2e98
Could not parse date for shooting victim with RD Number HX445856
Could not parse date for shooting victim with RD Number HX447455
Could not parse date for shooting victim with RD Number HY182250
Could not parse date for shooting victim with record ID 552c0a0f389ce8650e9a9916
Could not parse date for shooting victim with record ID 55c79ce6389ce865f1892777
Could not parse date for shooting victim with RD Number HY369178
Could not parse date for shooting victim with record ID 565d882edb573e070ae4c259
Could not parse date for shooting victim with record ID 565da430389ce82a2bd86b3b
Could not parse date for shooting victim with record ID 56e09073389ce82a2a5b09d1
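
The filter above checks the month and the day separately, which drops dates whose day of the month is later than today's, even in earlier months. One possible alternative, sketched here with made-up dates, is to compare `(month, day)` pairs as tuples. The helper name `is_on_or_before_month_day` is hypothetical and not part of this notebook. Swapping it in would change the totals reported below, which were produced by the filter as written.

```python
from datetime import date, datetime

def is_on_or_before_month_day(d, cutoff):
    # Tuples compare element by element, so (1, 31) <= (3, 30) is True
    # even though 31 > 30.
    return (d.month, d.day) <= (cutoff.month, cutoff.day)

cutoff = date(2016, 3, 30)
jan_31 = datetime.strptime('2015-01-31', '%Y-%m-%d')
apr_01 = datetime.strptime('2015-04-01', '%Y-%m-%d')

print(is_on_or_before_month_day(jan_31, cutoff))
print(is_on_or_before_month_day(apr_01, cutoff))
```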

Group shooting victims by year


In [3]:
import itertools

def get_shooting_year(shooting_victim):
    shooting_date = get_shooting_date(shooting_victim)
    return shooting_date.year

data['shooting_victims_ytd_by_year'] = []

for year, grp in itertools.groupby(data['shooting_victims_ytd'], key=get_shooting_year):
    data['shooting_victims_ytd_by_year'].append((year, list(grp)))
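
The grouping depends on the earlier sort: `itertools.groupby` only groups consecutive items that share a key. A small sketch with toy records (not real data) shows the difference. The `group_by_year` helper is hypothetical and exists only for this illustration.

```python
import itertools

# Toy records, deliberately not sorted by year
records = [('2015', 'a'), ('2016', 'b'), ('2015', 'c')]

def group_by_year(rows):
    # Collect the second field of each row under its year key
    return [(year, [name for _, name in grp])
            for year, grp in itertools.groupby(rows, key=lambda row: row[0])]

unsorted_groups = group_by_year(records)
sorted_groups = group_by_year(sorted(records))

print(unsorted_groups)  # 2015 shows up as two separate groups
print(sorted_groups)    # one group per year
```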

Count the victims by year


In [4]:
data['shooting_victims_ytd_by_year_totals'] = [(year, len(shooting_victims))
                                               for year, shooting_victims
                                               in data['shooting_victims_ytd_by_year']]
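
If only the totals were needed, and not the grouped records, `collections.Counter` could produce them without sorting or grouping first. This is a sketch on toy years, not the approach used in this notebook.

```python
from collections import Counter

# Toy years standing in for get_shooting_year() results
years = [2014, 2015, 2015, 2016, 2016, 2016]

# Counter tallies each year; sorting the items gives (year, count) pairs
totals = sorted(Counter(years).items())
print(totals)
```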

In [5]:
import csv
import sys

writer = csv.writer(sys.stdout)
writer.writerow(['year', 'num_shooting_victims'])

for year, num_shooting_victims in data['shooting_victims_ytd_by_year_totals']:
    writer.writerow([year, num_shooting_victims])


year,num_shooting_victims
2012,533
2013,393
2014,294
2015,422
2016,727
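
The same `csv.writer` calls work with any file-like object, not just `sys.stdout`. As a sketch, this writes toy totals (not real counts) to an in-memory buffer, which is handy when the CSV text needs to be passed along rather than printed.

```python
import csv
import io

# Toy totals for illustration; not real counts
toy_totals = [(2015, 2), (2016, 3)]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator='\n')
writer.writerow(['year', 'num_shooting_victims'])
writer.writerows(toy_totals)

csv_text = buf.getvalue()
print(csv_text)
```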

Spot-check our numbers


In [6]:
shooting_victims_2016 = next(shooting_victims
                             for year, shooting_victims
                             in data['shooting_victims_ytd_by_year']
                             if year == 2016)
num_shooting_victims_2016 = next(num_shooting_victims
                                 for year, num_shooting_victims
                                 in data['shooting_victims_ytd_by_year_totals']
                                 if year == 2016)
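# Reuse the pinned `today` from the filtering step so these assertions
# check the same cutoff that was used to build the data.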
num_shootings = 0
for shooting_victim in shooting_victims_2016:
    num_shootings += 1
    shooting_date = get_shooting_date(shooting_victim)
    assert shooting_date.year == 2016
    assert shooting_date.month <= today.month
    assert shooting_date.day <= today.day
    
assert num_shootings == num_shooting_victims_2016
